Method and device for the detection of points of interest in a digital source,
corresponding computer program and data carrier 

1. Field of the invention

The field of the invention is that of the detection of points of interest, also 
called salient points, in a digital image. More specifically, the invention relates to 
a technique for the detection of points of interest implementing a wavelet-type 
approach. 

A point of interest may be considered to be the representative of a spatial 
region of the image conveying a substantial portion of information. 

Historically, the notion of the salient point has been proposed in the field 
of computer vision, where one of the major problems consists of the detection of 
the corners of the objects (whence the term "salient" used here below as a 
synonym for the term "of interest"). This term was subsequently broadened to 
include other characteristics of images such as contours, junctions etc. 

In image processing, the detection of the salient points corresponding to
the corners of the objects is of little interest. Indeed, the corners are generally
isolated points, representing only a small part of the information contained in the
image. Furthermore, their detection generates clusters of salient points in the
textured or noisy regions.

Various other techniques have been proposed, relating especially to the 
salient points corresponding to the high frequency zones, namely to the contours 
of the objects. The invention can be applied more specifically to this type of 
technique. 

A more detailed description is given here below of the different techniques 
for the detection of salient points. 

2. Prior art 

The detection of salient points (also called points of interest) in images is a 
problem that has given rise to much research for many years. This section 
presents the main approaches classically used in the literature. Reference may be 
made to document [5] (the documents referred to are listed together in
Appendix B) for a more detailed review of the prior art.

One of the first methods was proposed by Harris and Stephens [7] for the 
detection of corners. Points of this type were deemed then to convey a major 
quantity of information and were applied in the field of computer vision. 

To define this detector, the following quantity is defined at each point
p(x,y) of the image I:

$$R_{x,y} = \mathrm{Det}(M_{x,y}) - k\,\mathrm{Tr}(M_{x,y})^2$$

where $M_{x,y}$ is a matrix defined by:

$$M_{x,y} = G(\sigma) \otimes \begin{pmatrix} I_x^2(x,y) & I_x(x,y)\,I_y(x,y) \\ I_x(x,y)\,I_y(x,y) & I_y^2(x,y) \end{pmatrix}$$

where:

❖ $G(\sigma)$ denotes a Gaussian kernel with variance $\sigma^2$;

❖ $\otimes$ denotes the convolution product;

❖ $I_x$ (resp. $I_y$) denotes the first derivative of I following the direction x
(resp. y);

❖ $\mathrm{Det}(M_{x,y})$ denotes the determinant of the matrix $M_{x,y}$;

❖ $\mathrm{Tr}(M_{x,y})$ denotes the trace of the matrix $M_{x,y}$;

❖ k is a constant generally used with a value of 0.04.

The salient points are then defined by the positive local extreme values of
the quantity $R_{x,y}$.
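
As an illustration of the quantity defined above, the following minimal sketch evaluates the response R at a single pixel. It assumes that the three entries of the smoothed matrix (the Gaussian-filtered values of Ix², Iy² and Ix·Iy) have already been computed; the function name and argument names are hypothetical and do not come from [7].

```python
def harris_response(a, b, c, k=0.04):
    """Response R = Det(M) - k * Tr(M)^2 for the symmetric 2x2 matrix
    M = [[a, c], [c, b]].

    a, b, c are assumed to be the Gaussian-smoothed values of Ix^2, Iy^2
    and Ix*Iy at one pixel (hypothetical inputs, computed elsewhere).
    """
    det = a * b - c * c
    trace = a + b
    return det - k * trace * trace


# Strong gradients in both directions give a positive response,
# a gradient in a single direction gives a negative one.
corner_like = harris_response(1.0, 1.0, 0.0)
edge_like = harris_response(1.0, 0.0, 0.0)
```

A full detector would then keep the positive local extreme values of R over the image, which this sketch does not attempt.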

In [5], the authors also propose a more precise version of the Harris and
Stephens detector. This version replaces the computation of the derivatives of the
image I by a precise computation of the Gaussian kernel.

The Harris and Stephens detector presented here above has been extended
to the case of color pictures in [6]. To do this, the authors extend the definition of
the matrix $M_{x,y}$ which then becomes:

$$M_{x,y} = G(\sigma) \otimes \begin{pmatrix} (R_x^2 + G_x^2 + B_x^2)(x,y) & (R_x R_y + G_x G_y + B_x B_y)(x,y) \\ (R_x R_y + G_x G_y + B_x B_y)(x,y) & (R_y^2 + G_y^2 + B_y^2)(x,y) \end{pmatrix}$$

where:

❖ $R_x, G_x, B_x$ respectively denote the first derivatives of the red, green and
blue colorimetrical planes in the direction x;

❖ $R_y, G_y, B_y$ respectively denote the first derivatives of the red, green and
blue colorimetrical planes in the direction y.

In [10], the authors consider the salient points to be the points of the image
showing high contrast. To build a detector of this kind, the authors use a
multiple-resolution approach based on the construction of a Gaussian pyramid.

Let it be assumed that the image I has a size $2^N \times 2^N$. We can define a
pyramid with N levels where the level 0 corresponds to the original image and the
level N-1 corresponds to a one-pixel image.

At the level k of the pyramid, the contrast of the point P is defined by:

$$C_k(P) = \frac{G_k(P)}{B_k(P)} \quad \text{with } 0 \le k \le N-1 \text{ and } C_N(P) = 1$$

where $G_k(P)$ defines the local luminance at the point P and at the level k, and
$B_k(P)$ defines the luminance of the local background at the point P and at the level
k.

These two variables are computed at each point and for each level of the
pyramid. They can therefore be represented by two pyramids called a luminance
pyramid and a background pyramid and defined by:

$$G_k(P) = \sum_{M \in \mathrm{Offspring}(P)} w(M)\, G_{k-1}(M)$$

$$B_k(P) = \sum_{Q \in \mathrm{Parent}(P)} W(Q)\, G_{k+1}(Q)$$

where:

❖ The notations Offspring(P) and Parent(P) denote the hierarchical
relationships in the Gaussian pyramid;

❖ w is a standardized weight function that can be adjusted in order to
simulate the Gaussian pyramid;

❖ W is a standardized weight function taking account of the way in which
P is used to build the luminance of its ancestors in the pyramid.

In this approach, a salient point is a point characterized by a high value of
the local contrast. In order to take account of the non-symmetry of the variable
$C_k$, the authors introduce a new variable in order to obtain a zero value for a
situation of non-contrast and a value > 0 everywhere else.
This new variable is defined by:

$$C_k^*(P) = \mathrm{Min}\left(\frac{|G_k(P) - B_k(P)|}{B_k(P)},\ \frac{|G_k(P) - B_k(P)|}{255 - B_k(P)}\right)$$

With this new variable, the salient points are defined by the local
maximum values of $C_k^*$ greater than a fixed threshold.

The salient points detector initially presented in [11] is doubtless the
closest to the present invention since it is also based on the use of the theory of
wavelets. Indeed, it is the view of the authors that the points conveying a major
part of the information are localized in the regions of the image having high
frequencies.

By using wavelets with compact carriers, the authors are capable of
determining the set of points of the signal f (assumed for the time being to be
one-dimensional) that were used to compute any wavelet coefficient
whatsoever $D_{2^j} f(n)$, and can do so at any resolution whatsoever $2^j$ ($j \le -1$).

On the basis of this observation, the hierarchy of wavelet coefficients is
built. For each resolution level and for each wavelet coefficient $D_{2^j} f(n)$ of this
level, this hierarchy determines the set of wavelet coefficients of the immediately
higher level of resolution $2^{j+1}$ necessary to compute $D_{2^j} f(n)$:

$$C(D_{2^j} f(n)) = \left\{ D_{2^{j+1}} f(k),\ 2n \le k \le 2n + 2p - 1 \right\}, \quad 0 \le n < 2^j N$$

where p denotes the regularity of the wavelet base used (i.e. the size of the
wavelet filter) and N denotes the length of the original signal f.

Thus, each wavelet coefficient $D_{2^j} f(n)$ is computed from $2^{-j} p$ points of
the signal f. Its offspring coefficients $C(D_{2^j} f(n))$ give the variation of a subset
of these $2^{-j} p$ points. The most salient subset is the one whose wavelet coefficient
is the maximum (in absolute value) at the resolution level $2^{j+1}$.

This coefficient therefore needs to be considered at this level of resolution.
By applying this process recursively, a coefficient $D_{2^{-1}} f(n)$ is selected with the
resolution 1/2. This coefficient represents 2p points of the signal f. To select the
corresponding salient point in f, the authors propose to choose that point, among
these 2p points, whose gradient is the maximum in terms of absolute value.
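
The recursive tracking described above for the one-dimensional case can be sketched as follows. This is only an illustration under stated assumptions: the coefficients are stored as a list of levels (coarsest first, each level twice as long as the previous one), offspring indices that fall outside a level are simply ignored, and the final choice among the 2p signal points by maximum gradient is not implemented. The function names are hypothetical and are not taken from [11].

```python
def offspring_indices(n, p, length):
    """Indices k with 2n <= k <= 2n + 2p - 1 at the next finer level,
    clipped to the length of that level (boundary handling is an assumption)."""
    return [k for k in range(2 * n, 2 * n + 2 * p) if k < length]


def track_salient_coefficient(levels, p, n=0):
    """Follow, level by level, the offspring coefficient of largest modulus.

    levels[0] is the coarsest list of wavelet coefficients; the returned
    value is the index selected at the finest level.
    """
    for finer in levels[1:]:
        candidates = offspring_indices(n, p, len(finer))
        n = max(candidates, key=lambda k: abs(finer[k]))
    return n


levels = [[5.0], [1.0, -3.0], [0.0, 0.5, -2.0, 1.0]]
selected = track_salient_coefficient(levels, p=1)
```

With p = 1 (a two-tap filter such as Haar) each coefficient has exactly two offspring; a larger p widens the search window at every level.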

To extend this approach to the 2D signals constituted by the images, the
authors apply the same approach to each of the three subbands
$D^1_{2^j} I$, $D^2_{2^j} I$, $D^3_{2^j} I$ where I denotes the original image. In the case of the images, the
spatial carrier of the wavelet base is sized $2p \times 2p$. Thus, the cardinal of
$C(D^s_{2^j} f(x,y))$ is $4p^2$ for any s = 1, 2, 3. For each orientation (horizontal, vertical
and oblique), the method makes a search, among the offspring coefficients of a
given coefficient, for the one whose amplitude is the maximum. If different
coefficients of different orientations lead to the same pixel of I, then this pixel is
considered to be a salient point.

This technique has been used especially in image indexing in [9].

3. Drawbacks of the Prior Art

As shown in the previous section, many methods have been proposed in
the literature for the detection of salient points.

The major difference between these approaches lies in the very
definition of a salient point. Historically, researchers in the field of computer
vision have devoted attention to the corners of objects. It is thus that the Harris
and Stephens detector [7] was proposed. This detector has recently been extended
to color in [6]. The corners of objects do not, however, represent any relevant
information in the field of image processing. Indeed, in the case of weakly
textured images, these points will be scattered in space and will not give any
satisfactory representation of the image. In the case of textured or noisy images,
the points will all be concentrated in the textures and will give only a local and
non-comprehensive representation of the image.

The definition of contrast-based salience [10] is appreciably more 
interesting for image processing. Unfortunately, this approach suffers from the 
same defect as the previous one in the case of textured or noisy regions. 
The wavelet-based approach proposed by E. Loupias and N. Sebe [11] is
clearly the most robust and most worthwhile approach. Indeed, it has long
been known that the contours represent the primary information of an
image since they perfectly match the human visual system.

4. Goals and characteristics of the invention

It is therefore a particular aim of the invention to overcome the different
drawbacks of the prior art.

More specifically, it is an aim of the invention to provide a technique for
the detection of salient points corresponding to a high frequency, and giving
preference to no particular direction in the image.

It is another aim of the invention to provide such a technique that calls for
a reduced number of operations as compared with prior art techniques.

In particular, it is a goal of the invention to provide a technique of this kind
enabling the use of wavelet bases with a large-sized carrier.

These goals, as well as others that shall appear more clearly here below,
are achieved by means of a method for the detection of points of interest in a
source digital image, said method implementing a wavelet transformation
associating a sub-sampled image, called a scale image, with a source image, and
wavelet coefficients corresponding to at least one detail image, for at least one
level of decomposition, a point of interest being a point associated with a region
of the image showing high frequencies.

According to the invention, this method comprises the following steps:

- the application of said wavelet transformation to said source image;

- the construction of a unique tree structure from the wavelet
coefficients of each of said detail images;

- the selection of at least one point of interest by analysis of said
tree structure.

In the present document, for the sake of simplification, the term "source
image" is applied to an original image or an image having undergone
pre-processing (gradient computations, change of colorimetrical space etc.).

Advantageously, for each level of decomposition, at least two detail
images, respectively corresponding to at least two directions predetermined by
said wavelet transformation, are determined.

This wavelet transformation may use especially first-generation or
second-generation (mesh-based) wavelets.

In particular, the detail images may comprise: 

- a detail image representing the vertical high frequencies; 

- a detail image representing the horizontal high frequencies; 

- a detail image representing the diagonal high frequencies. 

Advantageously, the method of the invention comprises a step for merging
the coefficients of said detail images so as not to give preference to any direction
of said source image.

Advantageously, said step for the construction of a tree structure relies on 
a zerotree type of approach. 

Thus, preferably, each point of the scale image having minimum resolution
is the root of a tree with which is associated at least one offspring node
respectively formed by each of the wavelet coefficients of each of said detail
image or images localized at the same position, and then recursively, four
offspring nodes are associated with each offspring node of a given level of
resolution, these four associated offspring nodes being formed by the wavelet
coefficients of the detail image that is of a same type and at the previous
resolution level, associated with the corresponding region of the source image.

According to an advantageous aspect of the invention, said selection step
implements a step for the construction of at least one salience map, assigning each
of said wavelet coefficients a salience value representing its interest. Preferably, a
salience map is built for each of said resolution levels.

Advantageously, for each of said salience maps, for each salience value, a
merging is performed of the pieces of information associated with the three
wavelet coefficients corresponding to the three detail images so as not to give
preference to any direction in the image.

According to a preferred aspect of the invention, a salience value of a
given wavelet coefficient having a given level of resolution takes account of the
salience value or values of the descending-order wavelet coefficients in said tree
structure of said given wavelet coefficient.

Preferably, a salience value is a linear relationship of the associated
wavelet coefficients.

In a particular embodiment of the invention, the salience value of a given
wavelet coefficient is computed from the following equations:

$$S_{2^{-1}}(x,y) = \alpha_{-1}\left(\frac{1}{3}\sum_{u=1}^{3}\frac{\left|D^u_{2^{-1}}(x,y)\right|}{\mathrm{Max}(D^u_{2^{-1}})}\right)$$

$$S_{2^j}(x,y) = \frac{1}{2}\left(\alpha_j\left(\frac{1}{3}\sum_{u=1}^{3}\frac{\left|D^u_{2^j}(x,y)\right|}{\mathrm{Max}(D^u_{2^j})}\right) + \frac{1}{4}\sum_{u=0}^{1}\sum_{v=0}^{1} S_{2^{j+1}}(2x+u,\,2y+v)\right)$$

In these equations, the parameter $\alpha_k$ may for example have a value $-1/r$ for
all the values of k.

According to another preferred aspect of the invention, said selection step
comprises a step for building a tree structure of said salience values, the step
advantageously relying on a zerotree type approach.

In this case, said selection step advantageously comprises the steps of:

- descending-order sorting of the salience values of the salience map
corresponding to the minimum resolution;

- selection of the branch having the highest salience value for each of the
trees thus sorted out.

According to a preferred aspect of the invention, said step for the selection
of the branch having the highest salience value implements a corresponding scan
of the tree starting from its root and a selection, at each level of the tree, of the
offspring node having the highest salience value.

As already mentioned, the invention enables the use of numerous wavelet
transformations. One particular embodiment implements the Haar base.

One particular embodiment chooses a minimum level of resolution $2^{-4}$.

The method of the invention may furthermore include a step for the
computation of an image signature, from a predetermined number of points of
interest of said image.

Said signature may thus be used especially to index images by their
content.

More generally, the invention can be applied in many fields, and for
example for:

- image watermarking;

- image indexing;

- the detection of faces in an image.

The invention also relates to devices for the detection of points of interest 
in a source digital image implementing the method as described here above. 

The invention also relates to computer programs comprising program code
instructions for the execution of the steps of the method for the detection of points
of interest described here above, and the carriers of digital data that can be used
by a computer carrying such a program.

Other characteristics and advantages of the invention shall appear from the
following description of a preferred embodiment, given by way of a simple
illustrative and non-exhaustive example and from the appended drawings, of
which:

- Figure 1 illustrates the principle of multi-resolution analysis of an
image I by wavelet transformation;

- Figure 2 presents a schematic view of a wavelet transformation;

- Figure 3 provides a view of a tree structure of wavelet coefficients
according to the invention;

- Figure 4 presents an example of salience maps and of the
corresponding salience trees;

- Figure 5 illustrates the salience of a branch of the tree of figure 4;

- Figures 6a and 6b illustrate experimental results of the method of
the invention, Figure 6a showing two original images and Figure
6b showing the corresponding salient points;

- Figure 7 illustrates an image indexing method implementing the
detection method of the invention.

5. Identification of the essential technical elements of the invention

5.0 General principles

One aim of the invention therefore is the detection of the salient points of
an image I. These points correspond to the pixels of I belonging to high-frequency
regions. This detection is based on wavelet theory [1][2][3]. Appendix A briefly
presents this theory.

The wavelet transform is a multi-resolution representation of the image
enabling the image to be expressed at the different resolutions 1/2, 1/4, etc. Thus, at
each level of resolution $2^j$ ($j \le -1$), the wavelet transform represents the image I,
sized $n \times m = 2^k \times 2^l$ ($k, l \in \mathbb{Z}$), in the form:

❖ a coarse image $A_{2^j} I$;

❖ a detail image $D^1_{2^j} I$ representing the vertical high frequencies (i.e. the
horizontal contours);

❖ a detail image $D^2_{2^j} I$ representing the horizontal high frequencies (i.e. the
vertical contours);

❖ a detail image $D^3_{2^j} I$ representing the diagonal high frequencies (i.e. the
corners).

Each of these images is sized $2^{k+j} \times 2^{l+j}$. Figure 1 illustrates this type of
representation.

Each of these images is obtained from $A_{2^{j+1}} I$ by a filtering followed
by a sub-sampling by a factor of two as shown in figure 2. It must be noted that
we have $A_{2^0} I = I$.

The invention therefore consists of choosing first of all a wavelet base and
a minimum level of resolution $2^r$ ($r \le -1$). Once the wavelet transformation has
been effected, it is proposed to scan each of the three detail images $D^1_{2^r} I$, $D^2_{2^r} I$
and $D^3_{2^r} I$ in order to build a tree structure of wavelet coefficients. This tree
structure is based on the zerotree approach [4], initially proposed for image
encoding. It enables the setting up of a salience map sized $2^{k+r} \times 2^{l+r}$ reflecting
the importance of each wavelet coefficient at the resolution $2^r$ ($r \le -1$).

Thus a coefficient having significant salience corresponds to a region of I
having high frequencies. Indeed, a wavelet coefficient having a high-value
modulus at the resolution $2^r$ ($r \le -1$) corresponds to a contour of the image
$A_{2^{r+1}} I$ along a particular direction (horizontal, vertical or oblique). The zerotree
approach tells us that each of the wavelet coefficients at the resolution
$2^r$ corresponds to a spatial zone sized $2^{-r} \times 2^{-r}$ in the image I.

From the built-up salience map, the invention proposes a method for
choosing, from among the $2^{-r} \times 2^{-r}$ pixels of I, the pixel that most represents
this zone.

In terms of potential applications, the detection of salient points in the
images may be used non-exhaustively for the following operations:

❖ Image watermarking: in this case, the salient points give information on
the possible localization of the mark in order to ensure its robustness;

❖ Image indexing: in detecting a fixed number of salient points, it is possible to
deduce a signature of the image from them (based for example on colorimetry
around salient points) which may then be used for the computation of
inter-image similarities;

❖ Detection of faces: among the salient points corresponding to the high
frequencies of the image, some are localized on the facial characteristics
(eyes, nose, mouth) of the faces present in the image. They may then be
used in a process of detection of faces in the images.

The technique of the invention differs from that proposed by E. Loupias
and N. Sebe [11]. The main differences are:

❖ The salient point search algorithm proposed by Loupias and Sebe requires
a search among $2^{2j} \times 4p^2 \times 3$ coefficients for each level of resolution
$2^j$ and for a square image. Our algorithm is independent of the size of the
wavelet base carrier, leading to a search from among
$2^{2j} \times 4 \times 3$ coefficients. This advantage enables the use of wavelet bases
with a carrier that may be large-sized while most of the publications using
the Loupias and Sebe detector use the Haar base which is far from being
optimal.

❖ The Loupias and Sebe method considers the subbands
independently of each other thus leading them to the detection, by priority,
of the maximum gradient points in every direction (i.e. the corners). For
our part, we merge the information contained in the different subbands so
that no preference is given to any particular direction.

5.1 Wavelet transformation

Wavelet transformation is a powerful mathematical tool for the multi- 
resolution analysis of a function [1][2][3]. Appendix A provides a quick overview 
of this tool. 

In the invention, the functions considered are digital images, i.e. discrete
2D functions. Without loss of generality, we assume here that the
processed images are sampled on a discrete grid of n lines and m columns with
value range in a sampled luminance space containing 256 values. Furthermore it
is assumed that $n = 2^k$ ($k \in \mathbb{Z}$) and that $m = 2^l$ ($l \in \mathbb{Z}$).

If the original image is referenced I, we then have:

$$I : [0,m] \times [0,n] \to [0,255], \quad (x,y) \mapsto I(x,y)$$

As mentioned in section 5.0, the wavelet transformation of I enables a
multi-resolution representation of I. At each level of resolution $2^j$ ($j \le -1$), the
representation of I is given by a coarse image $A_{2^j} I$ and by three detail images
$D^1_{2^j} I$, $D^2_{2^j} I$ and $D^3_{2^j} I$. Each of these images is sized $2^{k+j} \times 2^{l+j}$. This process is
illustrated in figure 2.

Wavelet transformation necessitates the choice of a scale function $\phi(x)$
as well as the choice of a wavelet function $\psi(x)$. From these two functions, a
scale filter H and a wavelet filter G are derived, their respective pulse responses h
and g being defined by:

$$h(n) = \langle \phi_{2^{-1}}(u),\, \phi(u-n) \rangle \quad \forall n \in \mathbb{Z}$$

$$g(n) = \langle \psi_{2^{-1}}(u),\, \phi(u-n) \rangle \quad \forall n \in \mathbb{Z}$$

Let $\tilde{H}$ and $\tilde{G}$ respectively denote the mirror filters of H and G (i.e.
$\tilde{h}(n) = h(-n)$ and $\tilde{g}(n) = g(-n)$).

It can then be shown [1] (cf. figure 2) that:

❖ $A_{2^j} I$ can be computed by convolving $A_{2^{j+1}} I$ with $\tilde{H}$ in both dimensions
and by sub-sampling by a factor of two in both dimensions;

❖ $D^1_{2^j} I$ can be computed by:

1. convolving $A_{2^{j+1}} I$ with $\tilde{H}$ along the direction y and by
sub-sampling by a factor of two along this same direction;

2. convolving the result of the step 1) with $\tilde{G}$ along the direction x
and by sub-sampling by a factor of two along this same direction.

❖ $D^2_{2^j} I$ may be computed by:

1. convolving $A_{2^{j+1}} I$ with $\tilde{G}$ along the direction y and by
sub-sampling by a factor of two along this same direction;

2. convolving the result of the step 1) with $\tilde{H}$ along the direction
x and by sub-sampling by a factor of two along this same direction.

❖ $D^3_{2^j} I$ may be computed by:

1. convolving $A_{2^{j+1}} I$ with $\tilde{G}$ along the direction y and by
sub-sampling by a factor of two along this same direction;

2. convolving the result of the step 1) with $\tilde{G}$ along the direction x
and by sub-sampling by a factor of two along this same direction.
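
The filtering and sub-sampling steps listed above can be sketched for one level of decomposition as follows. This is a minimal illustration, not the patented implementation: it assumes the Haar base (the base retained in the particular embodiment of section 6.1), an averaging normalization (pairwise half-sums and half-differences rather than the orthonormal factor), an image stored as a list of rows with even dimensions, and hypothetical function names.

```python
def analyze_pairs(seq, kind):
    """Haar filtering followed by sub-sampling by two of a 1D sequence:
    'low' gives pairwise averages, 'high' pairwise half-differences."""
    sign = 1.0 if kind == "low" else -1.0
    return [(seq[2 * i] + sign * seq[2 * i + 1]) / 2.0
            for i in range(len(seq) // 2)]


def filter_along_y(img, kind):
    """Apply analyze_pairs to every column (direction y)."""
    rows, cols = len(img), len(img[0])
    columns = [analyze_pairs([img[y][x] for y in range(rows)], kind)
               for x in range(cols)]
    return [[columns[x][y] for x in range(cols)] for y in range(rows // 2)]


def filter_along_x(img, kind):
    """Apply analyze_pairs to every row (direction x)."""
    return [analyze_pairs(row, kind) for row in img]


def decompose_one_level(approx):
    """Return (coarse, d1, d2, d3) computed from the finer approximation."""
    low_y = filter_along_y(approx, "low")
    high_y = filter_along_y(approx, "high")
    coarse = filter_along_x(low_y, "low")   # scale filter in both dimensions
    d1 = filter_along_x(low_y, "high")      # scale filter along y, wavelet filter along x
    d2 = filter_along_x(high_y, "low")      # wavelet filter along y, scale filter along x
    d3 = filter_along_x(high_y, "high")     # wavelet filter in both dimensions
    return coarse, d1, d2, d3


coarse, d1, d2, d3 = decompose_one_level([[1, 2], [3, 4]])
```

Calling `decompose_one_level` again on `coarse` gives the next coarser level, so repeating it $-r$ times reaches the minimum resolution $2^r$.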

5.2 Construction of the tree structure with wavelet coefficients

Once the wavelet transformation has been made up to the resolution
$2^r$ ($r \le -1$), we have available:

❖ an approximate image $A_{2^r} I$;

❖ three detail images $D^1_{2^j} I$, $D^2_{2^j} I$, $D^3_{2^j} I$ per level of resolution $2^j$ with
$j = -1, \ldots, r$.

A tree structure of wavelet coefficients is then built by the zerotree
technique [4]. The trees are built as follows (cf. figure 3):

❖ Each pixel p(x,y) of the image $A_{2^r} I$ is the root of a tree;

❖ Each root p(x,y) is assigned three offspring nodes designated by the
wavelet coefficients of the three detail images $D^s_{2^r} I$ (s = 1, 2, 3) localized at
the same place (x,y);

❖ Owing to the sub-sampling by a factor of two performed by the wavelet
transformation at each change in resolution, each wavelet coefficient
$\alpha^s_{2^r}(x,y)$ (s = 1, 2, 3) corresponds to a zone sized 2 x 2 pixels in the detail
image corresponding to the resolution $2^{r+1}$. This zone is localized at
(2x,2y) and all the wavelet coefficients belonging to it become the
offspring nodes of $\alpha^s_{2^r}(x,y)$.

Recursively, the tree structure is constructed wherein each wavelet
coefficient $\alpha^s_{2^u}(x,y)$ (s = 1, 2, 3 and 0 > u > r) possesses four offspring nodes
designated by wavelet coefficients of the image $D^s_{2^{u+1}} I$ localized in the region
situated in (2x,2y) and sized 2 x 2 pixels.

Once the tree structure is constructed, each wavelet coefficient
$\alpha^s_{2^r}(x,y)$ (s = 1, 2, 3) corresponds to a region sized $2^{-r} \times 2^{-r}$ pixels in the detail
image $D^s_{2^{-1}} I$.
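
The parent-to-offspring relationship used to build the trees can be illustrated with the short sketch below. It only manipulates positions (the 2 x 2 zone anchored at (2x, 2y) in the next finer detail image of the same type); the function names are hypothetical.

```python
def offspring(x, y):
    """Positions of the four offspring, in the next finer detail image of
    the same type, of the coefficient located at (x, y)."""
    return [(2 * x + u, 2 * y + v) for v in (0, 1) for u in (0, 1)]


def descendants(x, y, depth):
    """Positions reached after `depth` successive refinements from (x, y)."""
    nodes = [(x, y)]
    for _ in range(depth):
        nodes = [child for node in nodes for child in offspring(*node)]
    return nodes


children = offspring(1, 2)
zone = descendants(0, 0, 2)
```

After `depth` refinements the descendants of (x, y) tile a square zone of side $2^{depth}$ anchored at ($2^{depth}$ x, $2^{depth}$ y), which is how a coefficient at the minimum resolution comes to represent a whole region of the image.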

5.3 Construction of the salience maps

Starting from the tree structure obtained by the preceding step, we propose
to build a set of $-r$ salience maps (i.e. one salience map per level of resolution).
Each salience map $S_{2^j}$ ($j = -1, \ldots, r$) reflects the importance of the wavelet
coefficients present at the corresponding resolution $2^j$. Thus, the more important a
wavelet coefficient is deemed to be with respect to the information that it
conveys, the greater its salience value will be.

It must be noted that each wavelet coefficient gives preference to one
direction (horizontal, vertical or oblique) depending on the detail image to which
it belongs. However, we have chosen to favor no particular direction and have
therefore merged the information contained in the three wavelet coefficients
$\alpha^1_{2^j}(x,y)$, $\alpha^2_{2^j}(x,y)$, $\alpha^3_{2^j}(x,y)$ whatever the level of resolution $2^j$ and whatever
the localization (x,y) with $0 \le x < 2^{k+j}$ and $0 \le y < 2^{l+j}$.
Each salience map $S_{2^j}$ is sized $2^{k+j} \times 2^{l+j}$.

Furthermore, the salience of each coefficient with the resolution $2^j$ must
take account of the salience of its offspring in the tree structure of the coefficients.

In order to take account of all these properties, the salience of a coefficient
localized at (x,y) with the resolution $2^j$ is given by the following recursive
relationship:

$$S_{2^{-1}}(x,y) = \alpha_{-1}\left(\frac{1}{3}\sum_{u=1}^{3}\frac{\left|D^u_{2^{-1}}(x,y)\right|}{\mathrm{Max}(D^u_{2^{-1}})}\right)$$

$$S_{2^j}(x,y) = \frac{1}{2}\left(\alpha_j\left(\frac{1}{3}\sum_{u=1}^{3}\frac{\left|D^u_{2^j}(x,y)\right|}{\mathrm{Max}(D^u_{2^j})}\right) + \frac{1}{4}\sum_{u=0}^{1}\sum_{v=0}^{1} S_{2^{j+1}}(2x+u,\,2y+v)\right)$$

Equation 1: expression of the salience of a coefficient

Where:

❖ $\mathrm{Max}(D^s_{2^j})$ (s = 1, 2, 3) denotes the maximum value of the wavelet
coefficients in the detail image $D^s_{2^j} I$;

❖ $\alpha_k$ ($0 \le \alpha_k \le 1$) is used to set the size of the salience coefficients according
to the resolution level. It must be noted that we have $\sum_k \alpha_k = 1$.

❖ It must be noted that the salience values are standardized i.e.
$0 \le S_{2^j}(x,y) \le 1$.
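
The recursive relationship of Equation 1 can be sketched as follows. The sketch rests on several assumptions that should be checked against the equation itself: the merged directional term is the mean over the three detail images of |D(x,y)| / Max(D), Max is taken as the largest modulus in the detail image, a level other than $2^{-1}$ averages its four offspring saliences (factor 1/4) and halves the sum of the two terms (factor 1/2). The data layout (dictionaries indexed by j, images stored as lists of rows) and the function name are hypothetical.

```python
def salience_maps(details, alpha, r):
    """Build one salience map per level j = -1, ..., r.

    details[j] holds the three detail images of level 2^j (lists of rows);
    alpha[j] is the weight of level j. Returns a dict j -> salience map.
    """
    maps = {}
    for j in range(-1, r - 1, -1):                 # finest level first
        bands = details[j]
        height, width = len(bands[0]), len(bands[0][0])
        # Largest modulus of each detail image (assumed meaning of Max).
        peaks = [max(abs(c) for row in band for c in row) for band in bands]
        current = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                merged = sum(abs(band[y][x]) / peak
                             for band, peak in zip(bands, peaks) if peak > 0) / 3.0
                own = alpha[j] * merged
                if j == -1:
                    current[y][x] = own
                else:
                    finer = maps[j + 1]
                    kids = sum(finer[2 * y + v][2 * x + u]
                               for u in (0, 1) for v in (0, 1)) / 4.0
                    current[y][x] = 0.5 * (own + kids)
        maps[j] = current
    return maps


r = -2
details = {
    -1: [[[1, 0], [0, 0]], [[2, 0], [0, 0]], [[4, 0], [0, 0]]],
    -2: [[[3]], [[0]], [[0]]],
}
alpha = {j: -1.0 / r for j in (-1, -2)}            # alpha_k = -1/r for all k
maps = salience_maps(details, alpha, r)
```

Because every directional ratio lies in [0, 1] and the weights do not exceed 1, the values produced by this sketch stay within [0, 1], in line with the standardization noted above.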

As can be seen in Equation 1, the salience of a coefficient is a linear
relationship of the wavelet coefficients. Indeed, as mentioned in section 4, we
consider the salient points to be pixels of the image belonging to high-frequency
regions. Now, a high wavelet coefficient $\alpha^s_{2^j}(x,y)$ (s = 1, 2, 3) at the resolution
$2^j$ denotes a high-frequency zone in the image $A_{2^{j+1}} I$ with the localization (2x,2y).
Indeed, since the detail images are obtained by a high-pass filtering of the image
$A_{2^{j+1}} I$, each contour of $A_{2^{j+1}} I$ generates an elevated wavelet coefficient in one of
the detail images with the resolution $2^j$, this coefficient corresponding to the
orientation of the contour.

Thus, the formulation of the salience of a given coefficient in Equation 1 is
warranted.

5.4 Choice of the salient points 

Once the construction of the salience maps is completed, we propose a
method in order to choose the most salient points in the original image.

To do this, we build a tree structure of the salience values from the $-r$
built-up salience maps. In a manner similar to the building of the tree structure of the
wavelet coefficients, we can build $2^{k+l+2r}$ trees of salience coefficients, each
having a coefficient of $S_{2^r}$ as its root. As in the case of the zerotree technique,
each of these coefficients corresponds to a zone sized 2 x 2 coefficients in the
map $S_{2^{r+1}}$. It is then possible to recursively construct the tree in which each node is
assigned four offspring in the salience map having immediately higher resolution.
Figure 4 illustrates this construction.

In order to localize the most salient points in I, we carry out:

1. A descending-order sorting of the $2^{k+l+2r}$ salience values present in $S_{2^r}$;

2. The selection of the maximum salience branch of each of the $2^{k+l+2r}$ trees
thus sorted out.

In order to select this branch, it is proposed to scan the tree from the root.
During this scan a selection is made, at each level of the tree, of the offspring
node having the greatest salience value (cf. figure 5). We thus obtain a list of $-r$
salience values:

$$\left\{ S_{2^r}(x_1,y_1),\ S_{2^{r+1}}(x_2,y_2),\ \ldots,\ S_{2^{-1}}(x_{-r},y_{-r}) \right\}$$

with $(x_k,y_k) = \mathrm{Arg\,Max}\left\{ S_{2^{r+(k-1)}}(2x_{k-1}+u,\ 2y_{k-1}+v),\ 0 \le u \le 1,\ 0 \le v \le 1 \right\}$.

From the most salient branches of each tree, the pixel of I chosen as being
the most representative pixel of the branch is localized at $(2x_{-r}, 2y_{-r})$.
In practice, only a subset of the $2^{k+l+2r}$ trees is scanned. Indeed, for many
applications, a search is made for a fixed number n of salient points. In this case,
it is appropriate to scan only the n trees having the most salient roots.
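
The two steps above (sorting the roots, then following the branch of maximum salience in each retained tree) can be sketched as follows. The sketch assumes salience maps stored in a dictionary indexed by j (from r to -1), each map being a list of rows; the function names and this layout are hypothetical.

```python
def most_salient_branch(maps, r, x, y):
    """Scan the tree rooted at (x, y) in the map of level r, keeping at each
    level the offspring of greatest salience. Returns the visited positions
    and the pixel (2x, 2y) of the source image reached from the last one."""
    path = [(x, y)]
    for j in range(r + 1, 0):                      # levels r+1, ..., -1
        finer = maps[j]
        candidates = [(2 * x + u, 2 * y + v) for v in (0, 1) for u in (0, 1)]
        x, y = max(candidates, key=lambda pos: finer[pos[1]][pos[0]])
        path.append((x, y))
    return path, (2 * x, 2 * y)


def salient_points(maps, r, count):
    """Return the pixels selected from the `count` trees with the most salient roots."""
    coarse = maps[r]
    roots = [(coarse[y][x], x, y)
             for y in range(len(coarse)) for x in range(len(coarse[0]))]
    roots.sort(key=lambda item: item[0], reverse=True)   # descending-order sorting
    return [most_salient_branch(maps, r, x, y)[1] for _, x, y in roots[:count]]


example_maps = {
    -2: [[0.3, 0.9]],
    -1: [[0.1, 0.2, 0.0, 0.0],
         [0.4, 0.3, 0.0, 0.8]],
}
points = salient_points(example_maps, -2, 2)
```

Only `count` trees are scanned, and each scan visits four candidates per level, so the cost per salient point does not depend on the size of the wavelet filter.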

6. Detailed description of at least one particular embodiment

In this section, we use the technical elements presented in the previous 
section for which we set the necessary parameters in order to describe a particular 
embodiment. 

6.1 Choice of wavelet transformation

As mentioned in section 5.0, we must first of all choose a wavelet base and
a minimum resolution level $2^r$ ($r \le -1$).

For this particular embodiment, we propose to use the Haar base and r = -4.
The Haar base is defined by:

$$\phi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ 0 & \text{else} \end{cases}$$

for the scale function, and by:

$$\psi(x) = \begin{cases} 1 & \text{if } 0 \le x < \frac{1}{2} \\ -1 & \text{if } \frac{1}{2} \le x < 1 \\ 0 & \text{else} \end{cases}$$

for the wavelet function.

6.2 Construction of the tree structure of the wavelet coefficients 

In this step, no parameter whatsoever is required. The process is therefore
compliant with what is described in section 5.2.

6.3 Construction of the salience maps 

In this step, we must choose the parameters $\alpha_k$ ($-1 \ge k \ge r$) used to adjust
the importance given to the salience coefficients according to the level of
resolution to which they belong.

In this particular embodiment, we propose to use $\alpha_k = -\frac{1}{r}$, $\forall k \in [r, -1]$.

6.4 Choice of the salient points

This step requires no parameter. The process is therefore compliant with 
what is described in section 5.4. 

6.5 Experimental results 

The results obtained on natural images by using the parameters proposed
in this particular embodiment are illustrated in figure 6.

6.6 Example of application

Among the potential applications listed in section 4, this section
presents the use of salient points for the indexing of still images by content.

6.6.1 Purpose of image indexing

Image indexing by content enables the retrieval, from an image database, 
of a set of images visually similar to a given image called a request image. To do 
this, visual characteristics (also called descriptors) are extracted from the images 
and form the signature of the image. 

The signatures of the images belonging to the database are computed off-line
and are stored in the database. When the user submits a request
image to the indexing engine, the engine computes the signature of the request
image and cross-checks this signature with the pre-computed signatures of the
database.

This cross-checking is made by computing the distance between the
signature of the request image and the signatures of the database. The images
most similar to the request image are then those whose signature minimizes the
computed distance. Figure 7 illustrates this method.

The difficulty of image indexing then lies entirely in determining
robust descriptors and distances.

6.6.2 Descriptors based on the salient points of an image 
In this section, we propose to compute the signature of an image from a 
fixed number of salient points. This approach draws inspiration from [9]. 

A colorimetrical descriptor and a texture descriptor are extracted in the
vicinity of each of the salient points. The colorimetrical descriptor is constituted
by the 0th order (mean), 1st order (variance) and 2nd order moments in a
neighborhood sized 3x3 around each salient point. The texture descriptor is
constituted by the Gabor moments in a neighborhood sized 9x9.

Once the signature of the request image R has been computed, the distance
$D(R, I_j)$ between this signature and the signature of the j-th image $I_j$ in the database
is defined by:

$$D(R, I_j) = \sum_i W_i \, S_j(f_i), \quad j = 1, \ldots, N$$

where N denotes the number of images in the database and $S_j(f_i)$ is defined by:

$$S_j(f_i) = (x_i - q_i)^T (x_i - q_i)$$

where $x_i$ and $q_i$ respectively designate the i-th descriptor (for example i=1 for the
colorimetrical descriptor and i=2 for the texture descriptor) of the j-th image of the
base and of the request R. The weights $W_i$ make it possible to modulate the
importance of the descriptors relative to each other.
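
The weighted distance and the resulting ranking of the database can be sketched as follows. This is a simplified illustration: each signature is assumed to be a list holding one plain vector per descriptor type, and the function names, sample vectors and weights are hypothetical.

```python
def squared_distance(x, q):
    """(x - q)^T (x - q) for two descriptor vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, q))


def signature_distance(request, image, weights):
    """Weighted sum over descriptor types of the squared descriptor distances."""
    return sum(w * squared_distance(x, q)
               for w, x, q in zip(weights, image, request))


def rank_images(request, database, weights):
    """Return database indices sorted from most to least similar to the request."""
    distances = [signature_distance(request, sig, weights) for sig in database]
    return sorted(range(len(database)), key=lambda j: distances[j])


# Hypothetical signatures: [colorimetrical descriptor, texture descriptor].
request = [[1.0, 2.0], [0.0]]
database = [
    [[1.0, 2.0], [3.0]],
    [[1.0, 3.0], [0.0]],
]
weights = [1.0, 0.5]
print(rank_images(request, database, weights))
```

Raising or lowering an entry of `weights` changes how strongly the corresponding descriptor influences the ranking.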

Appendix A: an overview of the theory of wavelets
A.1 Introduction 

Wavelet theory [1][2][3] enables the approximation of a function (a curve, 
surface, etc.) at different resolution levels. Thus, this theory enables a function to 
be described in the form of a coarse approximation and of a series of details 
enabling the perfect reconstruction of the original function. 

Such a multi-resolution representation [1] of a function therefore enables
the hierarchical interpretation of the information contained in the function. To do
this, the information is reorganized into a set of details appearing at different
resolution levels. Starting from a sequence of resolution levels in ascending order
$(r_j)_{j \in Z}$, the details of a function at the resolution level $r_j$ are defined as the
difference of information between its approximation at the resolution $r_j$ and its
approximation at the resolution $r_{j+1}$.

A.2 Notation 

Before presenting the bases of multi-resolution analysis in greater detail, in 
this section we shall present the notation that will be used in the document. 

❖ The sets of integers and real numbers are respectively
referenced Z and R.

❖ $L^2(R)$ denotes the vector space of the measurable and
square-integrable 1D functions f(x).

❖ For $f(x) \in L^2(R)$ and $g(x) \in L^2(R)$, the scalar product of
f(x) and g(x) is defined by:

$$\langle f(x), g(x) \rangle = \int_{-\infty}^{+\infty} f(u)\, g(u)\, du.$$

❖ For $f(x) \in L^2(R)$ and $g(x) \in L^2(R)$, the convolution of f(x)
and g(x) is defined by:

$$f * g(x) = \int_{-\infty}^{+\infty} f(u)\, g(x - u)\, du.$$

❖ $L^2(R^2)$ denotes the vector space of the measurable and
square-integrable functions f(x,y) of two variables.

❖ For $f(x,y) \in L^2(R^2)$ and $g(x,y) \in L^2(R^2)$, the scalar
product of f(x,y) and g(x,y) is defined by:

$$\langle f(x,y), g(x,y) \rangle = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(u,v)\, g(u,v)\, du\, dv.$$

A.2 Properties of multi-resolution analysis 

This section intuitively presents the desired properties of the operator
enabling the multi-resolution analysis of a function. These properties come from
[1].

Let $A_{2^j}$ be the operator that approximates a function $f(x) \in L^2(R)$ with
the resolution $2^j$ ($j \ge 0$) (i.e. f(x) is defined by $2^j$ samples).

The following are the properties expected from $A_{2^j}$:

1. $A_{2^j}$ is a linear operator. If $A_{2^j} f(x)$ represents the
approximation of f(x) with the resolution $2^j$, then $A_{2^j} f(x)$ should not be
modified when it is again approximated at the resolution $2^j$. This
principle is written as $A_{2^j} \circ A_{2^j} = A_{2^j}$ and shows that the operator $A_{2^j}$ is a
projection operator in a vector space $V_{2^j} \subset L^2(R)$. This vector space may be
interpreted as the set of all the possible approximations at the resolution
$2^j$ of the functions of $L^2(R)$.

2. Among all the possible approximations of f(x) with the
resolution $2^j$, $A_{2^j} f(x)$ is the most similar to f(x). The operator $A_{2^j}$ is
therefore an orthogonal projection on $V_{2^j}$.

3. The approximation of a function at the resolution
$2^{j+1}$ contains all the information necessary to compute the same function at
the lower resolution $2^j$. This property of causality induces the following
relationship:

$$\forall j \in Z, \quad V_{2^j} \subset V_{2^{j+1}}.$$

4. The operation of approximation is the same at all values of
resolution. The spaces of the approximation function may be derived from
one another by a change of scale corresponding to the difference of
resolution:

$$\forall j \in Z, \quad f(x) \in V_{2^j} \Leftrightarrow f(2x) \in V_{2^{j+1}}.$$

5. When an approximation of f(x) at the resolution $2^j$ is
computed, a part of the information contained in f(x) is lost. However,
when the resolution tends toward infinity, the approximate function must
converge on the original function f(x). In the same way, when the
resolution tends toward zero, the approximate function contains less
information and must converge on zero.

Any sequence of vector spaces $(V_{2^j})_{j \in Z}$ that complies with all these properties is called
a multi-resolution approximation of $L^2(R)$.

A.3 Multi-resolution analysis of a 1D function
A.3.1 Search for a base of $V_{2^j}$

We have seen in section A.2 that the approximation operator $A_{2^j}$ is an
orthogonal projection in the vector space $V_{2^j}$. In order to numerically characterize
this operator, we must find an orthonormal base of $V_{2^j}$.

$V_{2^j}$ being a vector space containing the approximations of functions of
$L^2(R)$ with the resolution $2^j$, any function $f(x) \in V_{2^j}$ may be seen as a vector
with $2^j$ components. We must therefore find $2^j$ base functions.

One of the main theorems of the theory of wavelets stipulates that there is
a single function $\Phi(x) \in L^2(R)$, called a scale function, from which it is possible
to define $2^j$ base functions $\Phi_i^j(x)$ of $V_{2^j}$ by expansion and translation of $\Phi(x)$:

$$\Phi_i^j(x) = \Phi(2^j x - i), \quad i = 0, \ldots, 2^j - 1.$$

Approximating a function $f(x) \in L^2(R)$ at the resolution $2^j$ therefore
amounts to making an orthogonal projection of f(x) on the $2^j$ base functions $\Phi_i^j(x)$.
This operation consists in computing the scalar product of f(x) with each of the
$2^j$ base functions $\Phi_i^j(x)$:

$$A_{2^j} f(x) = \sum_{k=0}^{2^j - 1} \langle f(u), \Phi_k^j(u) \rangle\, \Phi_k^j(x) = \sum_{k=0}^{2^j - 1} \langle f(u), \Phi(2^j u - k) \rangle\, \Phi(2^j x - k).$$

It can be shown [1] that $A_{2^j} f(x)$ may be reduced to the convolution of
f(x) with the low-pass filter $\Phi(x)$, assessed at the point k:

$$A_{2^j} f = (f(u) * \Phi(-2^j u))(k), \quad k \in Z.$$

Since $\Phi(x)$ is a low-pass filter, $A_{2^j} f$ may be interpreted as a low-pass
filtering followed by a uniform sub-sampling.

A.3.2 Construction of the multi-resolution analysis 

In practice, the functions f to be approximated (signal, image, etc.) are
discrete. Let it be assumed that the original function f(x) is defined on
$n = 2^k$ ($k \in Z$) samples. The maximum resolution of f(x) is then n.

Let $A_n f$ be the discrete approximation of f(x) at the resolution level n.
According to the property of causality (cf. section A.2),
$A_{2^j} f$ can be computed from $A_n f$ for every value of $j < k$.

Indeed, in computing the projection of the $2^j$ base functions $\Phi_i^j(x)$ of
$V_{2^j}$ on $V_{2^{j+1}}$, it can be shown that $A_{2^j} f$ can be obtained by convolving $A_{2^{j+1}} f$ with
the low-pass filter corresponding to the scale function and by sub-sampling the
result by a factor of 2:

$$A_{2^j} f(u) = \sum_{k=0}^{2^{j+1} - 1} h(k - 2u)\, A_{2^{j+1}} f(k), \quad 0 \le u \le 2^j - 1$$

with $h(n) = \langle \Phi(2u), \Phi(u - n) \rangle$, $\forall n \in Z$.
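
The filtering and sub-sampling step above can be sketched in a few lines. This is an illustration under stated assumptions: the filter is stored as a dictionary from index to coefficient, and the Haar low-pass coefficients are taken as h(0) = h(1) = 1/2 (plain averaging), a normalization chosen for readability rather than one given in the text.

```python
def coarser_approximation(a_fine, h):
    """Compute A_{2^j} f(u) = sum_k h(k - 2u) * A_{2^{j+1}} f(k).

    a_fine: list of samples of the approximation at resolution 2^(j+1).
    h:      low-pass filter as a dict {index: coefficient}; missing indices are 0.
    Returns the list of samples at resolution 2^j (half as many samples).
    """
    out = []
    for u in range(len(a_fine) // 2):
        total = 0.0
        for k, sample in enumerate(a_fine):
            total += h.get(k - 2 * u, 0.0) * sample
        out.append(total)
    return out


# Assumed normalization of the Haar low-pass filter: average of two neighbours.
HAAR_LOWPASS = {0: 0.5, 1: 0.5}

signal = [2.0, 4.0, 6.0, 8.0]
level_1 = coarser_approximation(signal, HAAR_LOWPASS)
level_0 = coarser_approximation(level_1, HAAR_LOWPASS)
print(level_1, level_0)
```

Applying the function repeatedly yields the approximations at every resolution below n, as the property of causality requires.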

A.3.3 The detail function 

As mentioned in the property (5) of section A.3, the operation which
consists in approximating a function f(x) at the resolution $2^j$ on the basis of an
approximation at the resolution $2^{j+1}$ causes a loss of information.

This loss of information is contained in a function called a detail function
at resolution level $2^j$ and referenced $D_{2^j} f$. It must be noted that knowledge of
$D_{2^j} f$ and $A_{2^j} f$ enables the perfect reconstruction of the approximate function
$A_{2^{j+1}} f$.

The detail function at the resolution level $2^j$ is obtained by projecting the
original function f(x) orthogonally on the orthogonal complement of $V_{2^j}$ in $V_{2^{j+1}}$.
Let $W_{2^j}$ be this vector space.

To calculate this projection numerically, we need to find an orthonormal
base of $W_{2^j}$, i.e. $2^j$ base functions. Another important theorem of wavelet
theory stipulates that, through a scale function $\Phi(x)$, it is possible to define $2^j$
base functions of $W_{2^j}$. These base functions $\Psi_i^j(x)$ are obtained by expansion
and translation of a function $\Psi(x)$ called a wavelet function:

$$\Psi_i^j(x) = \Psi(2^j x - i), \quad i = 0, \ldots, 2^j - 1.$$

In the same way as for the construction of the approximation $A_{2^j} f$, it can
be shown that $D_{2^j} f$ can be obtained by a convolution of the original function f(x)
with the high-pass filter $\Psi(x)$ followed by a sub-sampling by a factor of $2^j$:

$$D_{2^j} f = (f(u) * \Psi(-2^j u))(k), \quad k \in Z.$$
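
The role of the detail function can be illustrated with the Haar base. The sketch below is not taken from the text: it assumes an averaging/differencing normalization (factor 1/2) and hypothetical function names, and it shows that the approximation and the detail at one level together restore the finer approximation exactly.

```python
def haar_step(a_fine):
    """Split a signal of even length into a coarser approximation and a detail."""
    pairs = [(a_fine[2 * u], a_fine[2 * u + 1]) for u in range(len(a_fine) // 2)]
    approx = [(a + b) / 2 for a, b in pairs]   # low-pass filter, then sub-sampling
    detail = [(a - b) / 2 for a, b in pairs]   # high-pass filter, then sub-sampling
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the finer approximation from the approximation and the detail."""
    fine = []
    for a, d in zip(approx, detail):
        fine.extend([a + d, a - d])
    return fine


signal = [2.0, 4.0, 6.0, 8.0]
approx, detail = haar_step(signal)
print(approx, detail, haar_inverse(approx, detail))
```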

A.4.5 Extension to the multi-resolution analysis of 2D functions

This section presents the manner of extending multi-resolution analysis by
wavelets to the functions of $L^2(R^2)$ such as images.

This is done by using the same theorems as the ones used earlier. Thus if
$V_{2^j}$ denotes the vector space of the approximations of $L^2(R^2)$ at the resolution
$2^j$, it can be shown that it is possible to find an orthonormal base of $V_{2^j}$ by
expanding and translating a scale function $\Phi(x,y) \in L^2(R^2)$:

$$\Phi^j_{k,l}(x,y) = \Phi(2^j x - k,\ 2^j y - l), \quad (k,l) \in Z^2.$$

In the particular case of the separable approximations of $L^2(R^2)$, we have
$\Phi(x,y) = \Phi(x)\Phi(y)$ where $\Phi(x)$ is a scale function of $L^2(R)$. In this case, the
multi-resolution analysis of a function of $L^2(R^2)$ is done by the sequential and
separable processing of each of the dimensions x and y.

As in the 1D case, the detail function at the resolution $2^j$ is obtained by an
orthogonal projection of f(x,y) on the complement of $V_{2^j}$ in $V_{2^{j+1}}$, written as $W_{2^j}$.

In the 2D case, it can be shown that if $\Psi(x)$ denotes the wavelet function
associated with the scale function $\Phi(x)$, then the three functions defined by:

$$\Psi^1(x,y) = \Phi(x)\Psi(y)$$
$$\Psi^2(x,y) = \Psi(x)\Phi(y)$$
$$\Psi^3(x,y) = \Psi(x)\Psi(y)$$

are wavelet functions of $L^2(R^2)$. Expanding and translating these three
wavelet functions gives an orthonormal base of $W_{2^j}$:

$$\Psi^{1,j}_{k,l}(x,y) = \Psi^1(2^j x - k,\ 2^j y - l)$$
$$\Psi^{2,j}_{k,l}(x,y) = \Psi^2(2^j x - k,\ 2^j y - l)$$
$$\Psi^{3,j}_{k,l}(x,y) = \Psi^3(2^j x - k,\ 2^j y - l).$$

The projection of f(x,y) on these three families of base functions of $W_{2^j}$
gives three detail functions:

$$D^1_{2^j} f = f(x,y) * \Phi_j(-x)\,\Psi_j(-y)$$
$$D^2_{2^j} f = f(x,y) * \Psi_j(-x)\,\Phi_j(-y)$$
$$D^3_{2^j} f = f(x,y) * \Psi_j(-x)\,\Psi_j(-y)$$
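
A one-level separable decomposition of a small image with the Haar base can be sketched as follows. The sketch assumes an image stored as a list of rows with even dimensions, an averaging/differencing normalization (factor 1/2), and hypothetical function names; rows are processed along x first, then columns along y.

```python
def haar_pairs(values):
    """1D Haar step: (low-pass, high-pass) outputs, each sub-sampled by 2."""
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(len(values) // 2)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(len(values) // 2)]
    return low, high


def filter_columns(rows):
    """Apply the 1D Haar step along y to a list of rows."""
    low, high = [], []
    for top, bottom in zip(rows[0::2], rows[1::2]):
        low.append([(a + b) / 2 for a, b in zip(top, bottom)])
        high.append([(a - b) / 2 for a, b in zip(top, bottom)])
    return low, high


def haar_step_2d(image):
    """Return (approximation, D1, D2, D3) for one decomposition level."""
    low_x, high_x = [], []
    for row in image:
        lo, hi = haar_pairs(row)          # filtering along x
        low_x.append(lo)
        high_x.append(hi)
    approx, d1 = filter_columns(low_x)    # scale in x: scale in y, wavelet in y
    d2, d3 = filter_columns(high_x)       # wavelet in x: scale in y, wavelet in y
    return approx, d1, d2, d3


image = [[1.0, 3.0],
         [5.0, 7.0]]
print(haar_step_2d(image))
```

In this sketch D1 responds to variations along y, D2 to variations along x, and D3 to variations along both directions.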